The Pentagon is planning for AI companies to train on classified data, defense official says

MIT Technology Review

The generative AI models used in classified environments can answer questions but don't currently learn from the data they see. The Pentagon is discussing plans to set up secure environments for generative AI companies to train military-specific versions of their models on classified data, MIT Technology Review has learned. AI models like Anthropic's Claude are already used to answer questions in classified settings; applications include analyzing targets in Iran. But allowing models to train on and learn from classified data would be a new development that presents unique security risks. It would mean sensitive intelligence like surveillance reports or battlefield assessments could become embedded in the models themselves, and it would bring AI firms into closer contact with classified data than before. Training versions of AI models on classified data is expected to make them more accurate and effective at certain tasks, according to a US defense official who spoke on background with MIT Technology Review.



Even Realities G2 Review: Smarter Glasses

WIRED

These second-generation smart glasses give you superpowers--if the software behaves. Optional R1 smart ring makes control simpler. Software stability remains an issue. Smart ring is an extra $249. Navigation needs a little finessing.


RECKONING: Reasoning through Dynamic Knowledge Encoding

Neural Information Processing Systems

Recent studies on transformer-based language models show that they can answer questions by reasoning over knowledge provided as part of the context (i.e., in-context reasoning). However, since the available knowledge is often not filtered for a particular question, in-context reasoning can be sensitive to distractor facts, additional content that is irrelevant to a question but that may be relevant for a different question (i.e., not necessarily random noise). In these situations, the model fails to distinguish the knowledge necessary to answer the question, leading to spurious reasoning and degraded performance. This reasoning failure contrasts with the model's apparent ability to distinguish its contextual knowledge from all the knowledge it has memorized during pre-training. Following this observation, we propose teaching the model to reason more robustly by folding the provided contextual knowledge into the model's parameters before presenting it with a question. Our method, RECKONING, is a bi-level learning algorithm that teaches language models to reason by updating their parametric knowledge through back-propagation, allowing them to answer questions using the updated parameters.
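The core mechanism, folding contextual knowledge into parameters via inner-loop gradient steps before answering, can be sketched with a toy model. This is an illustrative sketch, not the paper's code: the "model" is a single parameter w in y = w * x, the "knowledge" is a few (x, y) facts, and the analytic gradient stands in for back-propagation.

```python
# Toy illustration of the RECKONING idea (hypothetical, not the paper's code):
# contextual knowledge is folded into the model's parameters via inner-loop
# gradient descent, and the question is then answered from the updated
# parameters rather than from the raw context.

def inner_update(w, facts, lr=0.1, steps=50):
    """Fold 'facts' (x, y pairs) into parameter w of the toy model y = w * x."""
    for _ in range(steps):
        # Analytic gradient of the mean squared error over the facts.
        grad = sum(2 * (w * x - y) * x for x, y in facts) / len(facts)
        w -= lr * grad
    return w

# Context knowledge for this question: the underlying rule is y = 3x.
facts = [(1.0, 3.0), (2.0, 6.0), (4.0, 12.0)]
w = inner_update(0.0, facts)   # the knowledge now lives in w, not the context
answer = w * 5.0               # "question": what is y when x = 5?
print(round(answer, 2))        # → 15.0
```

In the paper's full bi-level setting, an outer loop would additionally optimize how the inner update is performed so that the post-update parameters answer questions well; the sketch above shows only the inner knowledge-folding step.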


FHIR-AgentBench: Benchmarking LLM Agents for Realistic Interoperable EHR Question Answering

Lee, Gyubok, Bach, Elea, Yang, Eric, Pollard, Tom, Johnson, Alistair, Choi, Edward, jia, Yugang, Lee, Jong Ha

arXiv.org Artificial Intelligence

The recent shift toward the Health Level Seven Fast Healthcare Interoperability Resources (HL7 FHIR) standard opens a new frontier for clinical AI, requiring LLM agents to navigate complex, resource-based data models instead of conventional structured health data. However, existing benchmarks have lagged behind this transition, lacking the realism needed to evaluate recent LLMs on interoperable clinical data. To bridge this gap, we introduce FHIR-AgentBench--a benchmark that grounds 2,931 real-world clinical questions in the HL7 FHIR standard. Using this benchmark, we systematically evaluate agentic frameworks, comparing different data retrieval strategies (direct FHIR API calls vs. specialized tools), interaction patterns (single-turn vs. multi-turn), and reasoning strategies (natural language vs. code generation). Our experiments highlight the practical challenges of retrieving data from intricate FHIR resources and the difficulty of reasoning over them--both of which critically affect question answering performance.
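To make the "direct FHIR API call" retrieval strategy concrete: FHIR servers expose a RESTful search interface of the form GET [base]/[ResourceType]?param=value. The sketch below builds such a search request for a patient's laboratory observations; the base URL is a made-up example, and the helper function is illustrative, not part of the benchmark.

```python
# Hypothetical illustration of the direct-FHIR-API retrieval strategy: an
# agent translates a clinical question into a FHIR REST search URL.
# The base URL is an example; 'patient', 'category', and '_count' are
# standard FHIR search parameters for the Observation resource.
from urllib.parse import urlencode

def fhir_search_url(base, resource, **params):
    """Build a FHIR REST search URL, e.g. for a patient's lab results."""
    return "%s/%s?%s" % (base.rstrip("/"), resource, urlencode(params))

url = fhir_search_url("https://fhir.example.org/r4", "Observation",
                      patient="Patient/123", category="laboratory", _count=10)
print(url)
```

An agent using this strategy must still decide which resource type and search parameters answer a given clinical question, which is exactly where the benchmark reports models struggling.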


Semantic World Models

Berg, Jacob, Zhu, Chuning, Bao, Yanda, Durugkar, Ishan, Gupta, Abhishek

arXiv.org Artificial Intelligence

Planning with world models offers a powerful paradigm for robotic control. Conventional approaches train a model to predict future frames conditioned on current frames and actions, which can then be used for planning. However, the objective of predicting future pixels is often at odds with the actual planning objective; strong pixel reconstruction does not always correlate with good planning decisions. This paper posits that instead of reconstructing future frames as pixels, world models only need to predict task-relevant semantic information about the future. To make such predictions, the paper poses world modeling as a visual question answering problem about semantic information in future frames. This perspective allows world modeling to be approached with the same tools underlying vision-language models. Thus, vision-language models can be trained as "semantic" world models through a supervised finetuning process on image-action-text data, enabling planning for decision-making while inheriting many of the generalization and robustness properties of the pretrained vision-language models. The paper demonstrates how such a semantic world model can be used for policy improvement on open-ended robotics tasks, leading to significant generalization improvements over typical paradigms of reconstruction-based action-conditional world modeling. Website available at https://weirdlabuw.github.io/swm.
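The planning loop this enables can be sketched in a toy form: instead of rendering the future frame, the world model answers a task-relevant question about the outcome of each candidate action, and the planner keeps the action whose predicted answer matches the goal. The gridworld, question string, and function names below are illustrative assumptions, not the paper's system.

```python
# Hypothetical sketch of planning with a "semantic" world model: rather than
# predicting future pixels, the model answers a task-relevant question about
# the state an action would lead to, and the planner selects accordingly.

GOAL = (1, 0)

def transition(pos, action):
    """Toy gridworld dynamics standing in for the real environment."""
    dx, dy = {"left": (-1, 0), "right": (1, 0),
              "up": (0, 1), "down": (0, -1)}[action]
    return (pos[0] + dx, pos[1] + dy)

def semantic_world_model(pos, action, question):
    """Answer a question about the future state instead of rendering it."""
    nxt = transition(pos, action)
    if question == "is the robot at the goal?":
        return nxt == GOAL
    raise ValueError("unknown question")

actions = ["left", "right", "up", "down"]
best = [a for a in actions
        if semantic_world_model((0, 0), a, "is the robot at the goal?")]
print(best)  # → ['right']
```

In the paper's setting, the answering function is a finetuned vision-language model operating on real images and free-form questions; the toy dynamics here just make the question-driven planning loop explicit.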



Can Large Language Models Bridge the Gap in Environmental Knowledge?

Smail, Linda, Calonge, David Santandreu, Kamalov, Firuz, Orak, Nur H.

arXiv.org Artificial Intelligence

The investigation employs a standardized tool, the Environmental Knowledge Test (EKT-19), supplemented by targeted questions, to evaluate the environmental knowledge of university students in comparison to the responses generated by the AI models. The results of this study suggest that while AI models possess a vast, readily accessible, and valid knowledge base with the potential to empower both students and academic staff, a human discipline specialist in environmental sciences may still be necessary to validate the accuracy of the information provided.

Keywords: Environmental Education; AI Models; EKT-19

1. Introduction
Extreme weather events, increasing global temperatures, rising sea levels, and changes to ecosystems and biodiversity are all consequences of climate change, which is mostly caused by anthropogenic greenhouse gas emissions (Masson-Delmotte et al., 2018). Meanwhile, the loss of biodiversity due to habitat degradation, pollution, overexploitation, and invasive species threatens the resilience of society's ecosystems (Nature, 2021). These consequences pose questions regarding food security, public health, and socioeconomic stability. Thus, effective access to accurate environmental knowledge is crucial for developing sustainable solutions and informed environmental policies.


What is Grok and why has Elon Musk's chatbot been accused of anti-Semitism?

Al Jazeera

Elon Musk's artificial intelligence company xAI has come under fire after its chatbot Grok stirred controversy with anti-Semitic responses to questions posed by users – just weeks after Musk said he would rebuild it because he felt it was too politically correct. On Friday last week, Musk announced that xAI had made significant improvements to Grok, promising a major upgrade "within a few days". The online tech news site The Verge reported that, by Sunday evening, xAI had already added new lines to Grok's publicly posted system prompts. By Tuesday, Grok had drawn widespread backlash after generating inflammatory responses, including anti-Semitic comments. One user who asked "which 20th-century figure would be best suited to deal with this problem (anti-white hate)" received an anti-Semitic response beginning: "To deal with anti-white hate? ..." Here's what we know about the Grok chatbot and the controversies it has caused. Grok, a chatbot created by xAI – the AI company Elon Musk ...


Structured Attention Matters to Multimodal LLMs in Document Understanding

Liu, Chang, Chen, Hongkai, Cai, Yujun, Wu, Hang, Ye, Qingwen, Yang, Ming-Hsuan, Wang, Yiwei

arXiv.org Artificial Intelligence

Document understanding remains a significant challenge for multimodal large language models (MLLMs). While previous research has primarily focused on locating evidence pages through precise multimodal queries, our work investigates a fundamental yet overlooked aspect: how input format influences document comprehension performance. Through systematic analysis, we discover that raw OCR text often impairs rather than improves MLLMs' performance, a counterintuitive finding we attribute to attention dispersion and structure loss. To further substantiate our hypothesis, we propose a novel structure-preserving approach that encodes document elements using the LaTeX paradigm, maintaining the hierarchical organization and spatial relationships critical for comprehension. Our attention analysis reveals that structured text induces structured attention patterns on both textual and visual content, directing models to focus on semantically meaningful regions while reducing attention waste. This approach significantly enhances MLLMs' document question answering performance across diverse document types without requiring architectural modifications or additional training.
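The contrast with raw OCR text can be illustrated with a small encoder sketch: document elements are rendered in LaTeX-style markup so that headings and table structure survive, rather than being flattened into an undifferentiated text stream. The element types and function below are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code) of structure-preserving encoding:
# each document element is rendered with LaTeX markup so hierarchy and table
# layout are preserved, instead of flattening everything into raw OCR text.

def encode_elements(elements):
    """Render (kind, content) document elements as LaTeX-style text."""
    out = []
    for kind, content in elements:
        if kind == "title":
            out.append(r"\section{%s}" % content)
        elif kind == "table":
            # content is a list of rows; '&' separates cells, '\\' ends rows.
            rows = " \\\\ ".join(" & ".join(r) for r in content)
            out.append(r"\begin{tabular}{%s} %s \end{tabular}"
                       % ("l" * len(content[0]), rows))
        else:  # plain paragraph text
            out.append(content)
    return "\n".join(out)

doc = [("title", "Quarterly Results"),
       ("table", [["Region", "Revenue"], ["EU", "1.2M"]]),
       ("text", "Revenue grew in all regions.")]
print(encode_elements(doc))
```

A raw-OCR baseline would instead emit "Quarterly Results Region Revenue EU 1.2M Revenue grew in all regions.", discarding exactly the hierarchical and spatial cues the paper finds the models' attention relies on.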